iMapReduce: Incremental MapReduce for Mining Evolving Big Data
نویسندگان
چکیده
As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task level re-computation, (ii) supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. We evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics. Experimental results on Amazon EC2 show significant performance improvements of i2MapReduce compared to both plain and iterative MapReduce performing re-computation.
منابع مشابه
iMapReduce: Incremental Iterative MapReduce
Cloud intelligence applications often perform iterative computations (e.g., PageRank) on constantly changing data sets (e.g., Web graph). While previous studies extend MapReduce for efficient iterative computations, it is too expensive to perform an entirely new large-scale MapReduce iterative job to timely accommodate new changes to the underlying data sets. In this paper, we propose iMapReduc...
متن کاملIncremental and Iterative Map reduce For Mining Evolving the Big Data in Banking System
Big data is a broad term for datasets so large and complex that the traditional data processing applications are inadequate, so i2mapreduce based framework for incremental and iterative computations are done in big data. State level processing computation easily retrieve the data and also time consuming. Incremental and iterative mapreducemapreduce is the most widely used big data processing to...
متن کاملA Survey on Parallel Rough Set Based Knowledge Acquisition Using MapReduce from Big Data
Nowadays, the volume of data is growing at an nprecedented rate, big data mining , and knowledge discovery have become a new challenge in the era of data mining and machine learning. Rough set theory for knowledge acquisition has been successfully applied in data mining. The MapReduce technique, received more attention from scientific community as well as industry for its applicability in big d...
متن کاملA Novel Mapreduce Lift Association Rule Mining Algorithm (Mrlar) for Big Data
Big Data mining is an analytic process used to discover the hidden knowledge and patterns from a massive, complex, and multi-dimensional dataset. Single-processor's memory and CPU resources are very limited, which makes the algorithm performance ineffective. Recently, there has been renewed interest in using association rule mining (ARM) in Big Data to uncover relationships between what seems t...
متن کاملLarge-scale incremental processing with MapReduce
An important property of today’s big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. While repeating full computation of the entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Proc...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015